Path Model Anomaly Detection with Bounding Boxes for Generalization

Matt Mahoney
May 1, 2004

In prior work (http://cs.fit.edu/~mmahoney/nasa/msg2.txt), path modeling
was compared with Gecko on the Marotta voltage test 1 data to detect
voltage anomalies. In path modeling, a training time series and its
first two derivatives (each smoothed twice) trace a path through a 3-D
phase space, scaled to a unit cube. A test point is assigned an anomaly
score equal to the square of the Euclidean distance to the closest point
on the training path.

Previously, if there were multiple training series, they were simply
concatenated, so the anomaly score was the squared distance to the
nearest training path. This method cannot generalize to allow test
points between normal values. For example, given two traces at 20V and
24V, we want 22V to receive a lower anomaly score than 18V or 26V
(since it is between two normal values), but this is not the case. All
three receive equal scores because each is 2V from the nearest training
value.

In this work, TSAD4 was modified to generalize across multiple training
paths to allow values between them. It assigns an anomaly score equal
to the square of the distance from the test point to the smallest box
enclosing the nearest points on each of the training paths (fig. 1).
For a single path, this reduces to the old algorithm. This modification
generalizes well when trained on series representing the bounds of
normal behavior.

           / /      /
          / /      /
       P1+-----+  /
        /|+P2  | /
       / |  x  |/
      / /+-----+P3
     / /      /
    / /      /

Fig. 1. Path modeling with bounding boxes. The closest points to test
point x on the three paths are P1, P2, and P3. The anomaly score is the
square of the distance from x to the smallest box that encloses P1, P2,
and P3, in this case 0 because x is inside the box. Without
generalization, the distance to the closest point (P3) would be used
instead.
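The scoring rule described above can be illustrated with a minimal Python sketch. This is not the TSAD4 source: the function names (nearest_on_path, nearest_path_score, bounding_box_score) are hypothetical, the box is assumed to be axis-aligned in the phase space, each path is assumed to be a list of vertices joined by straight segments, and the smoothing and unit-cube scaling steps are omitted.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q))

def nearest_on_segment(a, b, x):
    """Closest point to x on the straight segment from a to b."""
    d = [bi - ai for ai, bi in zip(a, b)]
    denom = sum(di * di for di in d)
    if denom == 0.0:                      # degenerate segment: a == b
        return list(a)
    t = sum((xi - ai) * di for xi, ai, di in zip(x, a, d)) / denom
    t = max(0.0, min(1.0, t))             # clamp the projection onto the segment
    return [ai + t * di for ai, di in zip(a, d)]

def nearest_on_path(path, x):
    """Closest point to x on a piecewise-linear path (list of vertices)."""
    if len(path) == 1:
        return list(path[0])
    candidates = [nearest_on_segment(a, b, x) for a, b in zip(path, path[1:])]
    return min(candidates, key=lambda p: sq_dist(p, x))

def nearest_path_score(paths, x):
    """Old rule: squared distance to the nearest point on any path."""
    return min(sq_dist(nearest_on_path(p, x), x) for p in paths)

def bounding_box_score(paths, x):
    """New rule: squared distance from x to the smallest axis-aligned box
    (an assumption of this sketch) enclosing the nearest point on each path."""
    nearest = [nearest_on_path(p, x) for p in paths]
    score = 0.0
    for i, xi in enumerate(x):
        lo = min(p[i] for p in nearest)
        hi = max(p[i] for p in nearest)
        gap = max(lo - xi, xi - hi, 0.0)  # 0 when xi lies inside [lo, hi]
        score += gap * gap
    return score

# Two toy "traces" at 20V and 24V, given as short paths in (x, dx, ddx) space.
paths = [
    [(20.0, -1.0, 0.0), (20.0, 1.0, 0.0)],
    [(24.0, -1.0, 0.0), (24.0, 1.0, 0.0)],
]
for volts in (18.0, 22.0, 26.0):
    x = (volts, 0.0, 0.0)
    print(volts, nearest_path_score(paths, x), bounding_box_score(paths, x))
```

In this toy setup the nearest-path rule gives 18V, 22V, and 26V the same score (each is 2V from a training trace), while the bounding-box rule scores 22V as 0 and leaves 18V and 26V unchanged. With a single path the box collapses to the nearest point, so both rules agree.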

Experimental Results - Voltage Generalization

TSAD4 v5 was trained on 2 or 3 traces from the set "Voltage Test 1"
consisting of normal traces at 14 to 32 volts in increments of 2 volts.
No hot or plunger-impeded traces were used. Parameters were set as
follows:

  N=20000   (number of training points in 1 trace)
  C=2       (column 2, Hall effect sensor)
  T=50.00   (smoothing filter time constant of 50 samples)
  K=50      (number of vertices in piecewise approximation per path)
  M=3       (number of dimensions: x, dx, and ddx)
  R=3       (fast: test previous, current, next 2, and random segment)
  S=1       (no subsampling)
  P=2 or 3  (number of training paths; new parameter added in TSAD4 v5)

For comparison, path modeling without generalization (N=40000 or 60000,
P=1) and Gecko results (from http://cs.fit.edu/~mmahoney/nasa/msg2.txt)
are shown again. As can be seen, unseen test voltages that are between
the training voltages receive low anomaly scores using bounding boxes,
showing good generalization.

Key:
  File - Marotta valve current, 20,000 samples over 2 seconds.
  Tr - A + indicates that the test file was used in training.
  Nearest path - previous results, repeated here.
  Bounding box - new algorithm described above.
  Max - Highest anomaly score for a single point (after smoothing with T)
  Total - Sum of anomaly score over all 20,000 points.
  Gecko - Previous results reported by Stan Salvador:
    P       = correctly passes at transition threshold 3, error threshold 10 or 20
    P/F     = fails at error threshold 10, pass at 20
    (blank) = correctly fails at error threshold 10 or 20
    -       = incorrectly fails between training traces (no generalization)
    *       = incorrectly fails on self

                       Nearest path                Bounding box
File                Tr Max             Total Gecko Max            Total
------------------- -- -------- ------------ ----- -------- -----------
V37898 V14 T21 R00s    0.030593    38.317706       0.009964   53.844155
V37898 V16 T21 R00s +  0.001563     0.835646 P     0.000106    0.112518
V37898 V18 T21 R00s    0.010952    54.399169 -     0.009590    1.664224
V37898 V20 T21 R00s +  0.000598     0.579976 P     0.000272    0.169250
V37898 V22 T21 R00s    0.013402    81.886032       0.021423  134.475311
V37898 V24 T21 R00s    0.052788   325.975681       0.092044  530.363351
V37898 V26 T21 R00s    0.115233   741.644852       0.200391 1198.311232
V37898 V28 T21 R00s    0.208292  1332.898601       0.364715 2146.486105
V37898 V30 T21 R00s    0.331121  2097.481117       0.579814 3372.326634
V37898 V32 T21 R00s    0.489472  3072.928435       0.839131 4940.186295

V37898 V14 T21 R00s    0.052124    83.233466       0.034927   71.044993
V37898 V16 T21 R00s    0.029287    56.521395       0.029274   47.613592
V37898 V18 T21 R00s    0.009376    35.379515       0.009864   41.925795
V37898 V20 T21 R00s +  0.000693     0.469590 P     0.000095    0.332452
V37898 V22 T21 R00s    0.007569    30.223044 P     0.007085    1.377781
V37898 V24 T21 R00s +  0.000762     0.342283 P     0.000067    0.146266
V37898 V26 T21 R00s    0.011394    56.321024       0.013736   81.974519
V37898 V28 T21 R00s    0.040074   228.117288       0.055281  332.454806
V37898 V30 T21 R00s    0.086912   516.771926       0.125402  754.104574
V37898 V32 T21 R00s    0.156792   938.623295       0.229274 1370.782604

V37898 V14 T21 R00s    0.049729   302.023024       0.059606  322.937925
V37898 V16 T21 R00s    0.060735   225.911067       0.080612  276.438166
V37898 V18 T21 R00s    0.049857   193.155379       0.039809   65.403405
V37898 V20 T21 R00s    0.018441    81.915891       0.020038  109.079059
V37898 V22 T21 R00s    0.005594    19.553865       0.006784   27.407847
V37898 V24 T21 R00s +  0.000433     0.286240 P     0.000078    0.051018
V37898 V26 T21 R00s    0.005845    21.080720 P     0.001430    0.195225
V37898 V28 T21 R00s +  0.000871     3.375865 P     0.000220    0.121461
V37898 V30 T21 R00s    0.008769    64.516314 P/F   0.009739   58.575805
V37898 V32 T21 R00s    0.031792   214.843551       0.040028  239.787592

V37898 V14 T21 R00s    0.048617   316.403843       0.043642   14.715687
V37898 V16 T21 R00s    0.060602   312.823472       0.074147  373.740922
V37898 V18 T21 R00s    0.065431   327.778512       0.064213  363.850035
V37898 V20 T21 R00s    0.042118   298.065245       0.037219   44.392886
V37898 V22 T21 R00s    0.026220   173.077767       0.032921  190.536967
V37898 V24 T21 R00s    0.017012    79.791156       0.020614   80.459667
V37898 V26 T21 R00s    0.004287    20.523354       0.005589   17.828933
V37898 V28 T21 R00s +  0.001408     0.442025 P     0.000029    0.021477
V37898 V30 T21 R00s    0.004731    16.039355 P     0.000403    0.068163
V37898 V32 T21 R00s +  0.000214     0.264199 P     0.000065    0.052735

V37898 V14 T21 R00s +  0.000185     0.247034 *     0.000156    0.616784
V37898 V16 T21 R00s    0.034172    63.798102 -     0.023824   36.866584
V37898 V18 T21 R00s    0.011743    41.175197 -     0.014513    3.436917
V37898 V20 T21 R00s +  0.000359     0.292652 P     0.000459    0.767665
V37898 V22 T21 R00s    0.013217    84.754241       0.040726  176.161268
V37898 V24 T21 R00s    0.054600   333.057361       0.177935  703.398975
V37898 V26 T21 R00s    0.117320   751.739438       0.387884 1600.870965
V37898 V28 T21 R00s    0.211603  1345.931139       0.710516 2878.605524
V37898 V30 T21 R00s    0.334565  2113.216088       1.134691 4527.727721
V37898 V32 T21 R00s    0.488202  3094.807931       1.642023 6640.007829

V37898 V14 T21 R00s    0.041018    58.254755       0.038114   65.702743
V37898 V16 T21 R00s    0.021778    43.696323       0.020642   43.959633
V37898 V18 T21 R00s    0.006596    26.814669       0.009022   40.379409
V37898 V20 T21 R00s +  0.000913     0.705107 P     0.000095    0.328552
V37898 V22 T21 R00s    0.008819    48.095410 P/F   0.008506    2.227060
V37898 V24 T21 R00s    0.006635    23.487464 P     0.002434    0.730332
V37898 V26 T21 R00s +  0.000361     0.593473 P     0.000059    0.089577
V37898 V28 T21 R00s    0.009032    48.236476       0.014566   80.971810
V37898 V30 T21 R00s    0.033475   194.134671       0.057424  331.268337
V37898 V32 T21 R00s    0.076193   448.467580       0.132270  768.896982

V37898 V14 T21 R00s    0.041215   274.491945       0.046988   16.331399
V37898 V16 T21 R00s    0.051808   266.882544       0.076239  298.376145
V37898 V18 T21 R00s    0.041913   228.307480       0.047035  322.737594
V37898 V20 T21 R00s    0.025232   144.553579       0.030307   42.008034
V37898 V22 T21 R00s    0.015416    66.157461       0.021575   97.356870
V37898 V24 T21 R00s    0.004027    16.968082       0.007241   21.178022
V37898 V26 T21 R00s +  0.000576     0.324425 P     0.000057    0.043899
V37898 V28 T21 R00s    0.005996    32.410841 P/F   0.004943    0.674795
V37898 V30 T21 R00s    0.004433    16.733599 P/F   0.001124    0.174122
V37898 V32 T21 R00s +  0.002088     0.508056 P     0.000294    0.087124

V37898 V14 T21 R00s    0.007021    20.984110       0.009707   52.153969
V37898 V16 T21 R00s +  0.003341     1.525242 *     0.000051    0.055932
V37898 V18 T21 R00s    0.006264    30.839238 -     0.006151    0.766728
V37898 V20 T21 R00s +  0.000936     4.347355 P     0.000047    0.069784
V37898 V22 T21 R00s    0.011973    32.129567 P/F   0.010924    1.170367
V37898 V24 T21 R00s +  0.001162     0.710093 P     0.000517    0.159932
V37898 V26 T21 R00s    0.014915    60.878358       0.022558  133.296099
V37898 V28 T21 R00s    0.046547   237.193588       0.089786  535.260893
V37898 V30 T21 R00s    0.096426   531.313742       0.204694 1211.953473
V37898 V32 T21 R00s    0.169542   957.846243       0.376147 2197.638863

V37898 V14 T21 R00s    0.038212   215.683156       0.048341   14.896814
V37898 V16 T21 R00s    0.047582   163.324157       0.076082  209.576995
V37898 V18 T21 R00s    0.029962   141.032598       0.031865   43.484285
V37898 V20 T21 R00s    0.013646    75.201418       0.017905   18.085157
V37898 V22 T21 R00s    0.003806    19.474233       0.006600   27.192221
V37898 V24 T21 R00s +  0.000837     0.753581 P     0.000062    0.038048
V37898 V26 T21 R00s    0.004222    23.266414 P     0.001244    0.163401
V37898 V28 T21 R00s +  0.000847     0.756855 P     0.000068    0.036377
V37898 V30 T21 R00s    0.004642    15.593690 P/F   0.000719    0.097340
V37898 V32 T21 R00s +  0.001966     1.218541 P     0.000386    0.123816

V37898 V14 T21 R00s +  0.000936     0.594796 *     0.000041    0.044266
V37898 V16 T21 R00s    0.014415    35.677406 -     0.006387    1.663523
V37898 V18 T21 R00s    0.012831    27.492452 -     0.002873    0.991272
V37898 V20 T21 R00s +  0.000593     0.463061 P     0.000165    0.590207
V37898 V22 T21 R00s    0.008213    49.328103 P     0.016993    4.203555
V37898 V24 T21 R00s    0.006291    24.991526 -     0.027248    3.126669
V37898 V26 T21 R00s +  0.007479     1.354242 P     0.000441    1.048194
V37898 V28 T21 R00s    0.016620    50.255582       0.043331  176.065652
V37898 V30 T21 R00s    0.042214   197.421942       0.175043  716.937254
V37898 V32 T21 R00s    0.088638   452.898638       0.394134 1664.255309

V37898 V14 T21 R00s    0.013953     7.597552       0.024858    8.713191
V37898 V16 T21 R00s    0.009149    23.147165       0.009208   11.724785
V37898 V18 T21 R00s +  0.001016     0.626030 P     0.000075    0.037386
V37898 V20 T21 R00s    0.007733    39.083756 P/F   0.012101    2.261516
V37898 V22 T21 R00s    0.004839    21.173559 P     0.001717    0.424589
V37898 V24 T21 R00s +  0.008065     1.253697 P     0.000293    0.060402
V37898 V26 T21 R00s    0.010128    36.700642 P     0.010547    1.288399
V37898 V28 T21 R00s    0.005177    23.699739 -     0.002734    0.366025
V37898 V30 T21 R00s +  0.001494     0.924237 P     0.000107    0.083630
V37898 V32 T21 R00s    0.009633    39.445291       0.021496  109.350817

The addition of bounding boxes slows testing slightly, from 0.45 to
0.55 seconds per 20,000 point series on a 750 MHz PC. Training time
including path approximation is 0.3 seconds per path.

Experimental Results - Poppet Blockage Generalization

The next experiment shows that path modeling with bounding boxes is
able to generalize over variations in poppet blockages. The three files
are measurements at 28V, normal temperature with poppet blockages of 0,
4.5 mils, and 9 mils. In the first experiment ("outside"), path
modeling is used with and without bounding boxes, training on 0 and 4.5
mils. In the second ("between"), training is on 0 and 9 mils.

For nearest path modeling, parameters are N=40000, C=2, T=50, K=100,
M=3, R=3, S=1, P=1 (same as above except for K=100 segments). For
bounding boxes, the parameters are changed to N=20000, K=50, P=2.
K is changed so that each training path is approximated by 50 segments
in both cases (K=100 for the two concatenated traces, or K=50 per path
with P=2), which keeps the self anomaly scores comparable across the
"between" and "outside" experiments. (This adjustment was not necessary
to show generalization in the voltage experiments above because we did
not compare scores between different training sets.)

The table below shows the results. Using nearest path modeling, the
anomaly score of the "between" 4.5 mils trace when trained on 0 and 9
mils (total 2.698493) is almost as large as the "outside" anomaly score
of the 9 mils trace when trained on 0 and 4.5 mils (total 3.743464).
However, bounding box modeling shows good generalization: the "between"
anomaly score (total 0.392329) is much lower than the "outside" score
(total 1.712334).

                              Nearest Path      Bounding Box
"Outside"                  Tr Max      Total    Max      Total
-------------------------- -- -------- -------- -------- --------
V37898 V28 T21 R00s        +  0.000594 0.136680 0.000161 0.063024
V37898 V28 T22 imp045 R01s +  0.003982 0.301998 0.000037 0.026507
V37898 V28 T22 imp09 R01s     0.024843 3.743464 0.013263 1.712334

"Between"
V37898 V28 T21 R00s        +  0.000279 0.108364 0.000162 0.048330
V37898 V28 T22 imp045 R01s    0.007656 2.698493 0.002599 0.392329
V37898 V28 T22 imp09 R01s  +  0.008340 0.660097 0.000023 0.023140

Conclusion

Path modeling with bounding boxes is appropriate for time series
anomaly detection when training series representing the limits of
normal behavior are available, such as minimum and maximum voltage,
temperature, pressure, etc. Unseen test series between the limiting
cases will be classified as normal, while cases outside the bounds set
in training will be classified as anomalous.